Ranjie Duan

Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models

Feb 08, 2026

YuFeng-XGuard: A Reasoning-Centric, Interpretable, and Flexible Guardrail Model for Large Language Models

Jan 22, 2026

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Jan 16, 2026

Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation

Jun 10, 2025

The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework

May 25, 2025

Enhancing Adversarial Robustness of Vision Language Models via Adversarial Mixture Prompt Tuning

May 23, 2025

DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

Apr 25, 2025

STAIR: Improving Safety Alignment with Introspective Reasoning

Feb 04, 2025

Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink

Jan 25, 2025

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Jan 09, 2025